Clustering based on genetics algorithm

Author

  • Dariusz Mazur
Abstract

Genetic algorithms (GAs) have been fairly successful at solving problems that are too ill-behaved (for example, multimodal and/or non-differentiable) for more conventional hill-climbing and derivative-based techniques. They are not guaranteed to find the globally optimal solution to a problem, but they are generally good at finding acceptably good solutions acceptably quickly. They are controlled by several inputs, such as the population size, the way a potential solution is encoded as a chromosome, and the choice of modification operators. Some of these choices are better suited to a particular problem than others, and no single choice is best for all problems. GAs have had a great measure of success in search and optimization problems. Much of this success is due to their ability to exploit the information accumulated about an initially unknown search space in order to bias subsequent searches into useful subspaces, i.e., their adaptation. This paper introduces an evolutionary clustering algorithm based on a decision list. There have been several proposals of genetic operators designed specifically for rule discovery. Although these genetic operators have been used mainly in classification, in general they can also be used in other tasks that involve rule discovery, such as dependence modeling. Mutation is a common reproduction operator used for finding new points in the search space to evaluate. When a chromosome is chosen for mutation, some of its genes are selected at random and modified. The paper proposes a variant of mutation that randomly chooses two elements of the list and swaps them (a kind of permutation). An observed feature of the algorithm is used to increase the efficiency of this mutation.
Part of the rule list is inactive: during the transcribing process only the leading rules are taken into consideration, and it is possible to determine which rule is the last one used, which divides the list into two parts, active and inactive. The order of the rules in the second, inactive part does not matter for the transcribing process, since these rules do not participate in it. To exploit this characteristic, the mutation function guarantees that one of the chosen elements always comes from the active part of the list. The key difference between this operator and the classic mutation operator is the information that each attempts to preserve during recombination. For the clustering problem, the important information would seem to be the adjacency information. This operator explicitly preserves adjacency and relative order information. Information about absolute positions appears to be relatively unimportant.
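
The swap mutation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the chromosome is assumed to be a plain list of rules, the active part is assumed to be the leading `active_len` entries, and the names `active_swap_mutation`, `rules`, `active_len`, and `rng` are hypothetical. How the boundary between the active and inactive parts is found during transcription is not specified in the abstract, so it is passed in as a parameter here.

```python
import random


def active_swap_mutation(rules, active_len, rng=random):
    """Return a copy of `rules` with two distinct positions swapped.

    Assumption: the first `active_len` entries form the active part of
    the rule list. One of the two swapped positions is always drawn
    from that active part, so the mutation cannot be spent entirely on
    rules that never take part in transcription.
    """
    n = len(rules)
    if n < 2:
        raise ValueError("need at least two rules to swap")
    if not 1 <= active_len <= n:
        raise ValueError("active_len must be between 1 and len(rules)")

    # First position: uniformly from the active prefix.
    i = rng.randrange(active_len)
    # Second position: uniformly from all positions except i.
    j = rng.randrange(n - 1)
    if j >= i:
        j += 1

    child = list(rules)  # leave the parent chromosome untouched
    child[i], child[j] = child[j], child[i]
    return child


if __name__ == "__main__":
    parent = ["r0", "r1", "r2", "r3", "r4", "r5"]
    # Hypothetical case: only the first two rules are active.
    print(active_swap_mutation(parent, active_len=2))
```

Because the second position is drawn from the whole list, a swap may bring an inactive rule into the active part or only reorder two active rules; either way the decoded clustering can change, which a swap confined to the inactive part could not achieve.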


Similar resources

A Clustering Approach by SSPCO Optimization Algorithm Based on Chaotic Initial Population

Assigning a set of objects to groups such that objects in one group or cluster are more similar to each other than to the other clusters’ objects is the main task of clustering analysis. The SSPCO optimization algorithm is a new optimization algorithm inspired by the behavior of a type of bird called the see-see partridge. One of the things that smart algorithms are applied to solve is the problem ...


Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Because the main weakness of most standard methods, including k-means and hierarchical data clustering, is their sensitivity to initialization and their tendency to get trapped in local minima, this paper proposes a modification of convex data clustering in which there is no need to be particular about how to select initial values. Due to properly converting the task of optimization to an equivalent...


An Optimization K-Modes Clustering Algorithm with Elephant Herding Optimization Algorithm for Crime Clustering

The detection and prevention of crime, in the past few decades, required several years of research and analysis. However, today, thanks to smart systems based on data mining techniques, it is possible to detect and prevent crime in considerably less time. Classification and clustering-based smart techniques can classify and cluster the crime-related samples. The most important factor in the c...


An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...


Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters, or clustering, is an important aspect of data analysis. It is the task of grouping a set of objects in such a way that objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis. This paper proposes an improved version of the K-Means algorithm, namely Persistent K...


Tabu-KM: A Hybrid Clustering Algorithm Based on Tabu Search Approach

The clustering problem under the criterion of minimum sum of squares is a non-convex and non-linear program that possesses many locally optimal values, so its solution often falls into these traps and therefore cannot converge to the globally optimal solution. In this paper, an efficient hybrid optimization algorithm, called Tabu-KM, is developed for solving this problem. It gathers the ...




Publication date: 2004